[InferAlignment] Propagate alignment between loads/stores of the same base pointer #145733
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Drew Kersnar (dakersnar)

Changes

At this point in the vectorization pass, we are guaranteed to have a contiguous chain with defined offsets for each element. Using this information, we can derive and upgrade alignment for elements in the chain based on their offset from previous well-aligned elements. This enables vectorization of chains that are longer than the maximum vector length of the target. This algorithm is also robust to the head of the chain not being well-aligned; if we find a better alignment while iterating from the beginning to the end of the chain, we will use that alignment moving forward.

Full diff: https://github.com/llvm/llvm-project/pull/145733.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 89f63c3b66aad..e14a936b764e5 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -343,6 +343,9 @@ class Vectorizer {
/// Postcondition: For all i, ret[i][0].second == 0, because the first instr
/// in the chain is the leader, and an instr touches distance 0 from itself.
std::vector<Chain> gatherChains(ArrayRef<Instruction *> Instrs);
+
+ /// Propagates the best alignment in a chain of contiguous accesses
+ void propagateBestAlignmentsInChain(ArrayRef<ChainElem> C) const;
};
class LoadStoreVectorizerLegacyPass : public FunctionPass {
@@ -716,6 +719,14 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
unsigned AS = getLoadStoreAddressSpace(C[0].Inst);
unsigned VecRegBytes = TTI.getLoadStoreVecRegBitWidth(AS) / 8;
+ // We know that the accesses are contiguous. Propagate alignment
+ // information so that slices of the chain can still be vectorized.
+ propagateBestAlignmentsInChain(C);
+ LLVM_DEBUG({
+ dbgs() << "LSV: Chain after alignment propagation:\n";
+ dumpChain(C);
+ });
+
std::vector<Chain> Ret;
for (unsigned CBegin = 0; CBegin < C.size(); ++CBegin) {
// Find candidate chains of size not greater than the largest vector reg.
@@ -1634,3 +1645,27 @@ std::optional<APInt> Vectorizer::getConstantOffset(Value *PtrA, Value *PtrB,
.sextOrTrunc(OrigBitWidth);
return std::nullopt;
}
+
+void Vectorizer::propagateBestAlignmentsInChain(ArrayRef<ChainElem> C) const {
+ ChainElem BestAlignedElem = C[0];
+ Align BestAlignSoFar = getLoadStoreAlignment(C[0].Inst);
+
+ for (const ChainElem &E : C) {
+ Align OrigAlign = getLoadStoreAlignment(E.Inst);
+ if (OrigAlign > BestAlignSoFar) {
+ BestAlignedElem = E;
+ BestAlignSoFar = OrigAlign;
+ }
+
+ APInt OffsetFromBestAlignedElem =
+ E.OffsetFromLeader - BestAlignedElem.OffsetFromLeader;
+ assert(OffsetFromBestAlignedElem.isNonNegative());
+ // commonAlignment is equivalent to a greatest common power-of-two divisor;
+ // it returns the largest power of 2 that divides both A and B.
+ Align NewAlign = commonAlignment(
+ BestAlignSoFar, OffsetFromBestAlignedElem.getLimitedValue());
+ if (NewAlign > OrigAlign)
+ setLoadStoreAlignment(E.Inst, NewAlign);
+ }
+ return;
+}
diff --git a/llvm/test/Transforms/LoadStoreVectorizer/prop-align.ll b/llvm/test/Transforms/LoadStoreVectorizer/prop-align.ll
new file mode 100644
index 0000000000000..a1878dc051d99
--- /dev/null
+++ b/llvm/test/Transforms/LoadStoreVectorizer/prop-align.ll
@@ -0,0 +1,296 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=load-store-vectorizer -S < %s | FileCheck %s
+
+; The IR has the first float3 labeled with align 16, and that 16 should
+; be propagated such that the second set of 4 values
+; can also be vectorized together.
+%struct.float3 = type { float, float, float }
+%struct.S1 = type { %struct.float3, %struct.float3, i32, i32 }
+
+define void @testStore(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testStore(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: store <4 x float> zeroinitializer, ptr [[TMP0]], align 16
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds [[STRUCT_S1:%.*]], ptr [[TMP0]], i64 0, i32 1, i32 1
+; CHECK-NEXT: store <4 x i32> zeroinitializer, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: ret void
+;
+ store float 0.000000e+00, ptr %1, align 16
+ %getElem = getelementptr inbounds %struct.float3, ptr %1, i64 0, i32 1
+ store float 0.000000e+00, ptr %getElem, align 4
+ %getElem8 = getelementptr inbounds %struct.float3, ptr %1, i64 0, i32 2
+ store float 0.000000e+00, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1
+ store float 0.000000e+00, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1, i32 1
+ store float 0.000000e+00, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1, i32 2
+ store float 0.000000e+00, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 2
+ store i32 0, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 3
+ store i32 0, ptr %getElem13, align 4
+ ret void
+}
+
+define void @testLoad(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testLoad(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0]], align 16
+; CHECK-NEXT: [[L11:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; CHECK-NEXT: [[L22:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
+; CHECK-NEXT: [[L33:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
+; CHECK-NEXT: [[L44:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds [[STRUCT_S1:%.*]], ptr [[TMP0]], i64 0, i32 1, i32 1
+; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: [[L55:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32 [[L55]] to float
+; CHECK-NEXT: [[L66:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32 [[L66]] to float
+; CHECK-NEXT: [[L77:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
+; CHECK-NEXT: [[L88:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
+; CHECK-NEXT: ret void
+;
+ %l1 = load float, ptr %1, align 16
+ %getElem = getelementptr inbounds %struct.float3, ptr %1, i64 0, i32 1
+ %l2 = load float, ptr %getElem, align 4
+ %getElem8 = getelementptr inbounds %struct.float3, ptr %1, i64 0, i32 2
+ %l3 = load float, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1
+ %l4 = load float, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1, i32 1
+ %l5 = load float, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 1, i32 2
+ %l6 = load float, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 2
+ %l7 = load i32, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds %struct.S1, ptr %1, i64 0, i32 3
+ %l8 = load i32, ptr %getElem13, align 4
+ ret void
+}
+
+; Also, test without the struct geps, to see if it still works with i8 geps/ptradd
+
+define void @testStorei8(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testStorei8(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: store <4 x float> zeroinitializer, ptr [[TMP0]], align 16
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 16
+; CHECK-NEXT: store <4 x i32> zeroinitializer, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: ret void
+;
+ store float 0.000000e+00, ptr %1, align 16
+ %getElem = getelementptr inbounds i8, ptr %1, i64 4
+ store float 0.000000e+00, ptr %getElem, align 4
+ %getElem8 = getelementptr inbounds i8, ptr %1, i64 8
+ store float 0.000000e+00, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds i8, ptr %1, i64 12
+ store float 0.000000e+00, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds i8, ptr %1, i64 16
+ store float 0.000000e+00, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds i8, ptr %1, i64 20
+ store float 0.000000e+00, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds i8, ptr %1, i64 24
+ store i32 0, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds i8, ptr %1, i64 28
+ store i32 0, ptr %getElem13, align 4
+ ret void
+}
+
+define void @testLoadi8(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testLoadi8(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, ptr [[TMP0]], align 16
+; CHECK-NEXT: [[L11:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
+; CHECK-NEXT: [[L22:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
+; CHECK-NEXT: [[L33:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
+; CHECK-NEXT: [[L44:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 16
+; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: [[L55:%.*]] = extractelement <4 x i32> [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32 [[L55]] to float
+; CHECK-NEXT: [[L66:%.*]] = extractelement <4 x i32> [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32 [[L66]] to float
+; CHECK-NEXT: [[L77:%.*]] = extractelement <4 x i32> [[TMP3]], i32 2
+; CHECK-NEXT: [[L88:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
+; CHECK-NEXT: ret void
+;
+ %l1 = load float, ptr %1, align 16
+ %getElem = getelementptr inbounds i8, ptr %1, i64 4
+ %l2 = load float, ptr %getElem, align 4
+ %getElem8 = getelementptr inbounds i8, ptr %1, i64 8
+ %l3 = load float, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds i8, ptr %1, i64 12
+ %l4 = load float, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds i8, ptr %1, i64 16
+ %l5 = load float, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds i8, ptr %1, i64 20
+ %l6 = load float, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds i8, ptr %1, i64 24
+ %l7 = load i32, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds i8, ptr %1, i64 28
+ %l8 = load i32, ptr %getElem13, align 4
+ ret void
+}
+
+
+; This version of the test adjusts the struct to hold two i32s at the beginning,
+; but still assumes that the first float3 is 16 aligned. If the alignment
+; propagation works correctly, it should be able to load this struct in three
+; loads: a 2x32, a 4x32, and a 4x32. Without the alignment propagation, the last
+; 4x32 will instead be a 2x32 and a 2x32
+%struct.S2 = type { i32, i32, %struct.float3, %struct.float3, i32, i32 }
+
+define void @testStore_2(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testStore_2(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: store <2 x i32> zeroinitializer, ptr [[TMP0]], align 8
+; CHECK-NEXT: [[GETELEM1:%.*]] = getelementptr inbounds [[STRUCT_S2:%.*]], ptr [[TMP0]], i64 0, i32 2
+; CHECK-NEXT: store <4 x float> zeroinitializer, ptr [[GETELEM1]], align 16
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds [[STRUCT_S2]], ptr [[TMP0]], i64 0, i32 3, i32 1
+; CHECK-NEXT: store <4 x i32> zeroinitializer, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: ret void
+;
+ store i32 0, ptr %1, align 8
+ %getElem = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 1
+ store i32 0, ptr %getElem, align 4
+ %getElem1 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2
+ store float 0.000000e+00, ptr %getElem1, align 16
+ %getElem2 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2, i32 1
+ store float 0.000000e+00, ptr %getElem2, align 4
+ %getElem8 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2, i32 2
+ store float 0.000000e+00, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3
+ store float 0.000000e+00, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3, i32 1
+ store float 0.000000e+00, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3, i32 2
+ store float 0.000000e+00, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 4
+ store i32 0, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 5
+ store i32 0, ptr %getElem13, align 4
+ ret void
+}
+
+define void @testLoad_2(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testLoad_2(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, ptr [[TMP0]], align 8
+; CHECK-NEXT: [[L1:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
+; CHECK-NEXT: [[L22:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
+; CHECK-NEXT: [[GETELEM1:%.*]] = getelementptr inbounds [[STRUCT_S2:%.*]], ptr [[TMP0]], i64 0, i32 2
+; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, ptr [[GETELEM1]], align 16
+; CHECK-NEXT: [[L33:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
+; CHECK-NEXT: [[L44:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
+; CHECK-NEXT: [[L55:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
+; CHECK-NEXT: [[L66:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds [[STRUCT_S2]], ptr [[TMP0]], i64 0, i32 3, i32 1
+; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: [[L77:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32 [[L77]] to float
+; CHECK-NEXT: [[L88:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1
+; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32 [[L88]] to float
+; CHECK-NEXT: [[L99:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
+; CHECK-NEXT: [[L010:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
+; CHECK-NEXT: ret void
+;
+ %l = load i32, ptr %1, align 8
+ %getElem = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 1
+ %l2 = load i32, ptr %getElem, align 4
+ %getElem1 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2
+ %l3 = load float, ptr %getElem1, align 16
+ %getElem2 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2, i32 1
+ %l4 = load float, ptr %getElem2, align 4
+ %getElem8 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 2, i32 2
+ %l5 = load float, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3
+ %l6 = load float, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3, i32 1
+ %l7 = load float, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 3, i32 2
+ %l8 = load float, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 4
+ %l9 = load i32, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds %struct.S2, ptr %1, i64 0, i32 5
+ %l0 = load i32, ptr %getElem13, align 4
+ ret void
+}
+
+; Also, test without the struct geps, to see if it still works with i8 geps/ptradd
+
+define void @testStorei8_2(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testStorei8_2(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: store <2 x i32> zeroinitializer, ptr [[TMP0]], align 8
+; CHECK-NEXT: [[GETELEM1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 8
+; CHECK-NEXT: store <4 x float> zeroinitializer, ptr [[GETELEM1]], align 16
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 24
+; CHECK-NEXT: store <4 x i32> zeroinitializer, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: ret void
+;
+ store i32 0, ptr %1, align 8
+ %getElem = getelementptr inbounds i8, ptr %1, i64 4
+ store i32 0, ptr %getElem, align 4
+ %getElem1 = getelementptr inbounds i8, ptr %1, i64 8
+ store float 0.000000e+00, ptr %getElem1, align 16
+ %getElem2 = getelementptr inbounds i8, ptr %1, i64 12
+ store float 0.000000e+00, ptr %getElem2, align 4
+ %getElem8 = getelementptr inbounds i8, ptr %1, i64 16
+ store float 0.000000e+00, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds i8, ptr %1, i64 20
+ store float 0.000000e+00, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds i8, ptr %1, i64 24
+ store float 0.000000e+00, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds i8, ptr %1, i64 28
+ store float 0.000000e+00, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds i8, ptr %1, i64 32
+ store i32 0, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds i8, ptr %1, i64 36
+ store i32 0, ptr %getElem13, align 4
+ ret void
+}
+
+define void @testLoadi8_2(ptr nocapture writeonly %1) {
+; CHECK-LABEL: define void @testLoadi8_2(
+; CHECK-SAME: ptr writeonly captures(none) [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, ptr [[TMP0]], align 8
+; CHECK-NEXT: [[L1:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
+; CHECK-NEXT: [[L22:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
+; CHECK-NEXT: [[GETELEM1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 8
+; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, ptr [[GETELEM1]], align 16
+; CHECK-NEXT: [[L33:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
+; CHECK-NEXT: [[L44:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
+; CHECK-NEXT: [[L55:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
+; CHECK-NEXT: [[L66:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
+; CHECK-NEXT: [[GETELEM10:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 24
+; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, ptr [[GETELEM10]], align 16
+; CHECK-NEXT: [[L77:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32 [[L77]] to float
+; CHECK-NEXT: [[L88:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1
+; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32 [[L88]] to float
+; CHECK-NEXT: [[L99:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
+; CHECK-NEXT: [[L010:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
+; CHECK-NEXT: ret void
+;
+ %l = load i32, ptr %1, align 8
+ %getElem = getelementptr inbounds i8, ptr %1, i64 4
+ %l2 = load i32, ptr %getElem, align 4
+ %getElem1 = getelementptr inbounds i8, ptr %1, i64 8
+ %l3 = load float, ptr %getElem1, align 16
+ %getElem2 = getelementptr inbounds i8, ptr %1, i64 12
+ %l4 = load float, ptr %getElem2, align 4
+ %getElem8 = getelementptr inbounds i8, ptr %1, i64 16
+ %l5 = load float, ptr %getElem8, align 8
+ %getElem9 = getelementptr inbounds i8, ptr %1, i64 20
+ %l6 = load float, ptr %getElem9, align 4
+ %getElem10 = getelementptr inbounds i8, ptr %1, i64 24
+ %l7 = load float, ptr %getElem10, align 4
+ %getElem11 = getelementptr inbounds i8, ptr %1, i64 28
+ %l8 = load float, ptr %getElem11, align 4
+ %getElem12 = getelementptr inbounds i8, ptr %1, i64 32
+ %l9 = load i32, ptr %getElem12, align 8
+ %getElem13 = getelementptr inbounds i8, ptr %1, i64 36
+ %l0 = load i32, ptr %getElem13, align 4
+ ret void
+}
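To make the arithmetic in the propagateBestAlignmentsInChain function from the diff above concrete, here is a small illustrative sketch. It is not part of the patch: the chain offsets and alignments are made up, and only llvm::Align and llvm::commonAlignment are real LLVM helpers.

// Standalone sketch of the chain walk: track the best-aligned element seen
// so far and derive each later element's alignment from its distance to it.
#include "llvm/Support/Alignment.h"
#include <cstdint>
#include <cstdio>

int main() {
  using llvm::Align;
  // (offset from leader, current alignment) pairs for a hypothetical chain.
  struct Elem { int64_t Offset; uint64_t Alignment; };
  Elem Chain[] = {{0, 4}, {16, 16}, {20, 4}, {24, 4}, {28, 4}};
  int64_t BestOff = Chain[0].Offset;
  uint64_t BestAlign = Chain[0].Alignment;
  for (Elem &E : Chain) {
    if (E.Alignment > BestAlign) {
      BestOff = E.Offset;
      BestAlign = E.Alignment;
    }
    // Largest power of two dividing both the best alignment and the distance.
    Align New = llvm::commonAlignment(
        Align(BestAlign), static_cast<uint64_t>(E.Offset - BestOff));
    if (New.value() > E.Alignment)
      E.Alignment = New.value();
    std::printf("offset %lld -> align %llu\n", (long long)E.Offset,
                (unsigned long long)E.Alignment);
  }
  return 0;
}

With a 16-aligned element at offset 16, the element at offset 24 is upgraded to align 8 via commonAlignment(16, 8), while the elements at offsets 20 and 28 stay at align 4.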
Nice
LGTM, but please wait for review by someone with more LSV experience before landing.
Adding @michalpaszkowski to reviewers since I saw you reviewed a recently merged LSV change. If you can think of anyone else who would be a better fit to review this let me know :)
%l7 = load i32, ptr %getElem12, align 8
%getElem13 = getelementptr inbounds i8, ptr %1, i64 28
%l8 = load i32, ptr %getElem13, align 4
ret void
Can you use more representative tests with some uses? As is these all optimize to memset or no-op
Slight pushback, my understanding is that lit tests are most useful when they are minimal reproducers of the problem the optimization is targeting. Adding uses would not really change the nature of this optimization. Tests like llvm/test/Transforms/LoadStoreVectorizer/NVPTX/vectorize_i1.ll
follow this thinking.
If you think it would be better, I could combine each pair of load and store tests into individual tests, storing the result of the loads. Other LSV tests use that pattern a lot.
; The IR has the first float3 labeled with align 16, and that 16 should
; be propagated such that the second set of 4 values
; can also be vectorized together.
Is this papering over a missed middle end optimization? The middle end should have inferred the pointer argument to align 16, and then each successive access should already have a refined alignment derived from that
I think this is nice to have for a variety of edge cases, but the specific motivator for this change is based on InstCombine's handling of nested structs. You are correct that if loads/stores are unpacked from aligned loads/stores of aggregates, alignment will be propagated to the unpacked elements. However, with nested structs, InstCombine unpacks one layer at a time, losing alignment context in between passes over the worklist.
Before IC:
; RUN: opt < %s -passes=instcombine -S | FileCheck %s
%struct.S1 = type { %struct.float3, %struct.float3, i32, i32 }
%struct.float3 = type { float, float, float }
define void @_Z7init_s1P2S1S0_(ptr noundef %v, ptr noundef %vout) {
%v.addr = alloca ptr, align 8
%vout.addr = alloca ptr, align 8
store ptr %v, ptr %v.addr, align 8
store ptr %vout, ptr %vout.addr, align 8
%tmp = load ptr, ptr %vout.addr, align 8
%tmp1 = load ptr, ptr %v.addr, align 8
%1 = load %struct.S1, ptr %tmp1, align 16
store %struct.S1 %1, ptr %tmp, align 16
ret void
}
After IC:
define void @_Z7init_s1P2S1S0_(ptr noundef %v, ptr noundef %vout) {
%.unpack.unpack = load float, ptr %v, align 16
%.unpack.elt7 = getelementptr inbounds nuw i8, ptr %v, i64 4
%.unpack.unpack8 = load float, ptr %.unpack.elt7, align 4
%.unpack.elt9 = getelementptr inbounds nuw i8, ptr %v, i64 8
%.unpack.unpack10 = load float, ptr %.unpack.elt9, align 8
%.elt1 = getelementptr inbounds nuw i8, ptr %v, i64 12
%.unpack2.unpack = load float, ptr %.elt1, align 4
%.unpack2.elt12 = getelementptr inbounds nuw i8, ptr %v, i64 16
%.unpack2.unpack13 = load float, ptr %.unpack2.elt12, align 4 ; <----------- this should be align 16
%.unpack2.elt14 = getelementptr inbounds nuw i8, ptr %v, i64 20
%.unpack2.unpack15 = load float, ptr %.unpack2.elt14, align 4
%.elt3 = getelementptr inbounds nuw i8, ptr %v, i64 24
%.unpack4 = load i32, ptr %.elt3, align 8
%.elt5 = getelementptr inbounds nuw i8, ptr %v, i64 28
%.unpack6 = load i32, ptr %.elt5, align 4
store float %.unpack.unpack, ptr %vout, align 16
%vout.repack23 = getelementptr inbounds nuw i8, ptr %vout, i64 4
store float %.unpack.unpack8, ptr %vout.repack23, align 4
%vout.repack25 = getelementptr inbounds nuw i8, ptr %vout, i64 8
store float %.unpack.unpack10, ptr %vout.repack25, align 8
%vout.repack17 = getelementptr inbounds nuw i8, ptr %vout, i64 12
store float %.unpack2.unpack, ptr %vout.repack17, align 4
%vout.repack17.repack27 = getelementptr inbounds nuw i8, ptr %vout, i64 16
store float %.unpack2.unpack13, ptr %vout.repack17.repack27, align 4
%vout.repack17.repack29 = getelementptr inbounds nuw i8, ptr %vout, i64 20
store float %.unpack2.unpack15, ptr %vout.repack17.repack29, align 4
%vout.repack19 = getelementptr inbounds nuw i8, ptr %vout, i64 24
store i32 %.unpack4, ptr %vout.repack19, align 8
%vout.repack21 = getelementptr inbounds nuw i8, ptr %vout, i64 28
store i32 %.unpack6, ptr %vout.repack21, align 4
ret void
}
To visualize what's happening under the hood, InstCombine is unpacking the load in stages like this:
%1 = load %struct.S1, ptr %tmp1, align 16
->
load struct.float3 align 16
load struct.float3 align 4
load i32 align 8
load i32 align 4
->
load float align 16
load float align 4
load float align 8
load float align 4
load float align 4
load float align 4
load i32 align 8
load i32 align 4
If the instcombine is fixed to propagate the alignment properly, will this patch still be useful/needed? If not, then I would agree with Matt that we should try fixing the source of the problem.
This instcombine case is the only one I currently know of that benefits from this specific optimization, but I suspect there could be other ways to arrive at a pattern like this.
But let's ignore those hypotheticals, because I understand the desire to stick within the realm of known use cases. The question is, are we comfortable changing the unpackLoadToAggregate
algorithm in InstCombine to recurse through nested structs?
llvm-project/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
Lines 788 to 801 in 8c32f95
for (uint64_t i = 0; i < NumElements; i++) {
  Value *Indices[2] = {
      Zero,
      ConstantInt::get(IdxType, i),
  };
  auto *Ptr = IC.Builder.CreateInBoundsGEP(AT, Addr, ArrayRef(Indices),
                                           Name + ".elt");
  auto EltAlign = commonAlignment(Align, Offset.getKnownMinValue());
  auto *L = IC.Builder.CreateAlignedLoad(AT->getElementType(), Ptr,
                                         EltAlign, Name + ".unpack");
  L->setAAMetadata(LI.getAAMetadata());
  V = IC.Builder.CreateInsertValue(V, L, i);
  Offset += EltSize;
}
I feel like implementing that would be a bit of a mess, and it leads to questions like "how many layers deep should it recurse?" Unpacking one layer at a time and then adding the nested struct elements to the IC worklist to be independently operated on in a later iteration is a much cleaner solution and feels more in line with the design philosophy of InstCombine. I could be wrong though; open to feedback or challenges to my assumptions.
And just to be clear, in case my previous explanation was confusing, the reason it would have to recurse through all layers at once is because we cannot store the knowledge that "the second element of load struct.float3 align 4
is aligned to 16" on the instruction; there isn't a syntax available to express that.
Edit: We would also have to change unpackStoreToAggregate to do this too, further increasing the code complexity
This has the align 16 with opt -attributor-enable=module -O3. The default attribute passes do a worse job it seems.
Are there any existing passes in particular that would benefit from this upgraded alignment?
Isel benefits from better alignment information in general.
I think the compile time tradeoffs that come with implementing it in InstCombine or InferAlignments would be not worth it. For InstCombine specifically, we have a lot of practical issues with the compile time stress it puts on our compiler at NVIDIA, so we tend to avoid adding complexity to it when possible.
We should definitely not infer better alignments in InstCombine. We specifically split the InferAlignment pass out of InstCombine to reduce compile-time from repeatedly inferring alignment.
I think with the scheme I proposed above doing this in InferAlignment should be close to free, but of course we won't know until we try.
The problem is that I think if we want to avoid those expensive calls and limit the analysis to "shallow" geps that offset directly off of aligned base pointers, we will miss a lot of cases. There are transformations like StraightLineStrengthReduce that build chains of geps that increase the instruction depth needed to find the underlying object.
I'm not sure we'll be losing much in practice. I'd generally expect that after CSE, cases where alignment can be increased in this manner will have the form of constant offset GEPs from a common base. All of the tests you added in this PR are of this form. Did you see motivating cases where this does not hold?
We should definitely not infer better alignments in InstCombine. We specifically split the InferAlignment pass out of InstCombine to reduce compile-time from repeatedly inferring alignment.
I agree. There is technically a solution that involves expanding the aggregate unpacking logic to traverse multiple layers at once, which would theoretically solve this problem for "free", but I think there are other tradeoffs that come with that.
I'm not sure we'll be losing much in practice. I'd generally expect that after CSE, cases where alignment can be increased in this manner will have the form of constant offset GEPs from a common base.
This is a reasonable argument. Let me investigate this possibility and get back to you all with my findings.
This has the align 16 with opt -attributor-enable=module -O3. The default attribute passes do a worse job it seems
Sorry, can you clarify what "this" is? What test/command are you running?
That is the command, opt -attributor-enable=module -O3
Thanks, I see what you mean now. That Attributor pass (which does not run by default) labels the argument pointer with the align 16
attribute, and then later InferAlignment uses computeKnownBits (which sees that attribute) to upgrade the load/store.
I don't know much about the history of the Attributor pass. Is that something we should consider enabling by default? Or maybe adding the alignment attribute to the argument at some other point in the pipeline, such as in InstCombine when the aggregate is split? Edit: or maybe an addition to InferAlignments that attempts to add param attributes before the computeKnownBits pass?
Also, it's looking like the InferAlignment change is working. I'll put it up for review soon, and we can weigh it against a possible Attributor fix.
I still have the feeling this is beyond the scope of what the vectorizer should be doing. Attributor / functionattrs + instcombine should be dealing with this
@arsenm Just in case my message gets buried in the other thread, I'll put it here too: I'd love to look closer at this, can you share details about your repro/commands? Thank you :)
Hi all, still investigating the InferAlignment change. While I'm doing so, I wanted to start a conversation for a different future change I have in mind. Let's say we had an alignment-upgrading analysis that used scalar evolution to compute alignment. It was even more self-contained than this alignment propagation optimization, in that it didn't take advantage of the LSV's data structures; it was just a black box that took in a single load/store and output a proven better alignment. Theoretically, this would make perfect sense as an addition to InferAlignment, but here's the catch: what if it is found to be very expensive (compile time)? In that situation, would it be worthwhile to place the optimization inside the LSV, to be run at the moment before a given chain was discarded for being misaligned, essentially as a "last ditch effort" to upgrade alignment with an expensive analysis? Or would it still make sense for it to live in InferAlignment and run on every load/store, following the feedback on this PR? I'm happy to have this conversation in more detail once I put the referenced change up for review, as well as try to collect some more tangible compile-time data for this hypothetical tradeoff, but I wanted to get some initial thoughts before I refactor it in one direction or another.
I'm not sure how ScalarEvolution would help you with this problem. We want static alignments applied to memory instructions and source values, not analyzing the range of possible runtime alignment values.
We have a SCEV-based alignment propagation pass, but it's only used for alignment assumptions; see https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp. At one point, I did try using SCEV to infer alignments for everything, but it was very expensive. InferAlignment currently has a weakness, in that it (like basically all other alignment helpers) is based on computeKnownBits(), which does not (meaningfully) look through phi nodes, so it cannot always infer perfect alignments for loops. The way I would fix that wouldn't be via SCEV, though, but by using an actual dataflow analysis to propagate alignments instead of the recursive known-bits analysis. That should provide the most accurate information with the best performance. This would take a bit more implementation effort, though, and hasn't really come up as a problem in the past, so nobody has bothered doing something like this yet...
llvm/test/Transforms/InferAlignment/propagate-from-other-load-stores.ll
I just pushed the InferAlignment solution. I'm currently looking into an alternative attributor/functionattrs solution too, if we think we prefer that.
The pull request title/subject should be updated to reflect the new approach, too.
Changed |= tryToImproveAlign(
    DL, &I, [&](Value *PtrOp, Align LoadStoreAlign, Align PrefAlign) {
As we're iterating over instructions in the BB, we're applying alignment in one direction only. E.g. if we only have alignment info on the last instruction in the BB, it will not benefit any other instructions in the block.
Considering that we're no longer operating on a single load/store chain, we may need to run the loop twice. First one will collect the best known alignments for all base pointers we encounter, and the second one will apply the best alignment to loads/stores.
Backwards propagation is more complicated because it has to handle implicit control flow. I doubt that doing this would be worthwhile in practice.
I agree with nikic. The most common use cases of this optimization will have the best alignment earlier in a basic block because split up aggregates will be best aligned at the first element.
Just a heads up, I'm going to be away from the internet until the 30th. Sorry to leave you all hanging! Thanks so much for the feedback so far; this is super helpful.
LGTM
Co-authored-by: Nikita Popov <[email protected]>
It looks like some clang tests need updating.
It also occurred to me that it would be good to check that negative GEP offsets are handled correctly.
Good call, they were not being handled correctly. Taking the abs of the offset fixes it.
Are you sure the abs is needed? It looks like your tests also passed before that change. I think this ends up being correct because an offset like -8 is something like ...11111000 in binary, which has the same effect for the common alignment as using ...00001000.
I think you're right. Since the only thing that really matters for commonAlignment is the number of trailing zeros of each input, we can drop the abs. Is there any merit to keeping the abs for readability, so future maintainers don't need to reason through this? Maybe just a comment?
I think it's ok to drop as long as there's test coverage, but no strong opinion either way.
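To make the trailing-zeros reasoning above concrete, here is a small hedged check (not from the patch; the constants are arbitrary) using the real llvm::commonAlignment helper:

// commonAlignment only depends on the low set bits of its operands, so the
// two's-complement pattern of -8 (...11111000) gives the same answer as +8.
#include "llvm/Support/Alignment.h"
#include <cassert>
#include <cstdint>

int main() {
  using llvm::Align;
  using llvm::commonAlignment;
  assert(commonAlignment(Align(16), 8).value() == 8);
  assert(commonAlignment(Align(16), static_cast<uint64_t>(-8)).value() == 8);
  assert(commonAlignment(Align(16), 4).value() == 4);
  assert(commonAlignment(Align(16), static_cast<uint64_t>(-4)).value() == 4);
  return 0;
}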
for (BasicBlock &BB : F) {
  // We need to reset the map for each block because alignment information
FYI, the problem here is worse than described. The code as written appears to be correct, I'm just pointing out a conceptual problem which may need reflecting in comments, etc..
Consider this:
%v = load i8, ptr %p, align 4
call void @throw_if_unaligned(%p, 16)
store i8 %v, ptr %p, align 16
Propagating the alignment forward is sound, but propagating it backwards (over the possibly throwing call) is not.
Good call out, thank you for pointing that out! This tracks with my understanding of why a backwards propagation worked in the LSV but doesn't work here: the LSV analyzes within the scope of what it calls a "pseudo basic block", which is defined as follows:
/// Runs the vectorizer on a "pseudo basic block", which is a range of
/// instructions [Begin, End) within one BB all of which have
/// isGuaranteedToTransferExecutionToSuccessor(I) == true.
Anyway, I can adjust the comment to call out your example. And just to confirm, the hypothetical dominator tree approach described in my comment would still be correct, right?
I updated the comment. Let me know if it looks accurate.
I'll merge tomorrow morning, unless other reviewers have any hesitations.
We can derive and upgrade alignment for loads/stores using other well-aligned loads/stores. This optimization does a single pass through each basic block and uses loads/stores (the alignment and the offset) to derive the best possible alignment for a base pointer, caching the result. If it encounters another load/store based on that pointer, it tries to upgrade the alignment. The optimization must be within a basic block because control flow can impact alignment guarantees.
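A minimal, self-contained sketch of that per-block scheme, under assumed names (MemAccess and propagateAlignments are illustrative stand-ins, not the actual patch); only llvm::Align and llvm::commonAlignment below are real LLVM helpers:

#include "llvm/Support/Alignment.h"
#include <cstdint>
#include <map>
#include <vector>

struct MemAccess {
  const void *Base;      // underlying base pointer of the load/store
  int64_t Offset;        // constant byte offset from that base
  llvm::Align Alignment; // alignment currently attached to the instruction
};

// Single forward pass over one basic block: cache the best-aligned access
// seen so far for each base pointer, and use it to upgrade later accesses
// to the same base. The cache is not carried across blocks, matching the
// constraint that control flow can affect alignment guarantees.
void propagateAlignments(std::vector<MemAccess> &Block) {
  std::map<const void *, std::pair<int64_t, llvm::Align>> Best;
  for (MemAccess &A : Block) {
    auto It = Best.find(A.Base);
    if (It != Best.end()) {
      auto [BestOff, BestAlign] = It->second;
      llvm::Align Derived = llvm::commonAlignment(
          BestAlign, static_cast<uint64_t>(A.Offset - BestOff));
      if (Derived > A.Alignment)
        A.Alignment = Derived;
    }
    if (It == Best.end() || A.Alignment > It->second.second)
      Best[A.Base] = {A.Offset, A.Alignment};
  }
}

Unlike the earlier in-LSV version, the cache here is keyed by base pointer and dropped at each basic block boundary, so alignment facts are only ever propagated forward within a block.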